Spatially selective transformation of a spatially varying optical characteristic of an image in an array of pixels

ABSTRACT

A method and system for selectively transforming a spatially varying optical characteristic (F) of an image in a pixel array. The pixel array is segmented into stripes of contiguous rows. A two-dimensional convolution C(x, y) of F is determined at only selected pixels (x, y). C(x, y) is a function of a product of a horizontal kernel h(x) and a vertical kernel v(y). Determining C(x, y) at each selected pixel (x, y) includes determining n vertical convolutions, wherein each vertical convolution is equal to a scalar product of F and v(y) in a kernel space surrounding (x,y), forming an array (V) from the n vertical convolutions, and computing C(x,y) as a scalar product of V and a constant horizontal vector (H) formed from h(x). The stripes are collected to form a transformed image which is stored and/or displayed. A cache facilitates selective reuse of vertical convolutions for determining C(x,y).

FIELD OF THE INVENTION

The present invention provides a method and system for selectively transforming a spatially varying optical characteristic of an image in an array of pixels.

BACKGROUND OF THE INVENTION

There is currently a sustained convergence of multimedia applications and multi-core processors (e.g., next-generation game consoles). The onset of multi-core architecture significantly changes programming semantics toward parallel programming, rendering existing techniques unsuitable. Therefore, the convergence requires novel techniques to exploit the benefits of the new features of multi-core technology.

One of the basic operations widely used in image processing and other multimedia applications is convolution. It is used in a wide variety of applications in image processing such as image filtering, image coding and compression, image analysis and evaluation, and computing shape descriptors for objects. It is also used for analysis and computation in many fields of mathematics and engineering.

In processing an image defined on an array of pixels, convolutions may be selectively required at some pixels but not at other pixels. Unfortunately, current techniques for computing convolutions selectively at the pixels of an image do not sufficiently exploit this selectivity and thus lack efficiency.

Accordingly, there is a need for a method and system that computes convolutions selectively at pixels of an image in a manner that sufficiently exploits this selectivity to improve the efficiency of computing the convolutions.

SUMMARY OF THE INVENTION

The present invention provides a method for selectively transforming a spatially varying optical characteristic (F) of an image in an array of pixels, said array characterized by NY rows of pixels oriented in an X direction and NX columns of pixels oriented in a Y direction, said NX and NY each at least 5, said optical characteristic denoted as F(x,y) such that x and y are indexes of pixels in the X and Y directions, said method implemented by execution of instructions by a processor of a computer system, said instructions being stored on computer readable storage media of the computer system, said method comprising:

segmenting the array of pixels into at least one stripe, wherein each stripe consists of a contiguous sequence of rows of the array of pixels;

transforming the rows of each stripe, said transforming comprising determining a two-dimensional convolution C(x, y) of F at only selected pixels (x, y) in each stripe, wherein C(x, y) is a function of a product of a horizontal kernel h(x) and a vertical kernel v(y), said h(x) and v(y) each of dimension n≧3, wherein said determining C(x, y) at each selected pixel (x, y) in each row of each stripe comprises: determining n vertical convolutions such that each vertical convolution is equal to a scalar product of F and v(y) in a kernel space surrounding (x,y), forming an array (V) of dimension n from the n vertical convolutions, and computing C(x,y) as a scalar product of V and a constant horizontal vector (H) formed from h(x);

after said transforming, collecting the stripes to form a transformed image;

storing and/or displaying the transformed image,

wherein a current row of a current stripe of the at least one stripe comprises a contiguous ordered sequence of a selected pixel (S1) at which C(x, y) is determined, u unselected pixels at which C(x, y) is not determined such that 1≦u≦(n−2), and a selected pixel (S2) at which C(x, y) is determined after C(x, y) at S1 is determined.

The present invention provides a computer program product, comprising a computer readable storage medium having a computer readable program code stored therein, said computer readable program code containing instructions that when executed by a processor of a computer system implement a method for selectively transforming a spatially varying optical characteristic (F) of an image in an array of pixels, said array characterized by NY rows of pixels oriented in an X direction and NX columns of pixels oriented in a Y direction, said NX and NY each at least 5, said optical characteristic denoted as F(x,y) such that x and y are indexes of pixels in the X and Y directions, said method comprising:

segmenting the array of pixels into at least one stripe, wherein each stripe consists of a contiguous sequence of rows of the array of pixels;

transforming the rows of each stripe, said transforming comprising determining a two-dimensional convolution C(x, y) of F at only selected pixels (x, y) in each stripe, wherein C(x, y) is a function of a product of a horizontal kernel h(x) and a vertical kernel v(y), said h(x) and v(y) each of dimension n≧3, wherein said determining C(x, y) at each selected pixel (x, y) in each row of each stripe comprises: determining n vertical convolutions such that each vertical convolution is equal to a scalar product of F and v(y) in a kernel space surrounding (x,y), forming an array (V) of dimension n from the n vertical convolutions, and computing C(x,y) as a scalar product of V and a constant horizontal vector (H) formed from h(x);

after said transforming, collecting the stripes to form a transformed image;

storing and/or displaying the transformed image,

wherein a current row of a current stripe of the at least one stripe comprises a contiguous ordered sequence of a selected pixel (S1) at which C(x, y) is determined, u unselected pixels at which C(x, y) is not determined such that 1≦u≦(n−2), and a selected pixel (S2) at which C(x, y) is determined after C(x, y) at S1 is determined.

The present invention provides a computer system comprising a processor and a computer readable memory unit coupled to the processor, said memory unit containing instructions that when executed by the processor implement a method for selectively transforming a spatially varying optical characteristic (F) of an image in an array of pixels, said array characterized by NY rows of pixels oriented in an X direction and NX columns of pixels oriented in a Y direction, said NX and NY each at least 5, said optical characteristic denoted as F(x,y) such that x and y are indexes of pixels in the X and Y directions, said method comprising:

segmenting the array of pixels into at least one stripe, wherein each stripe consists of a contiguous sequence of rows of the array of pixels;

transforming the rows of each stripe, said transforming comprising determining a two-dimensional convolution C(x, y) of F at only selected pixels (x, y) in each stripe, wherein C(x, y) is a function of a product of a horizontal kernel h(x) and a vertical kernel v(y), said h(x) and v(y) each of dimension n≧3, wherein said determining C(x, y) at each selected pixel (x, y) in each row of each stripe comprises: determining n vertical convolutions such that each vertical convolution is equal to a scalar product of F and v(y) in a kernel space surrounding (x,y), forming an array (V) of dimension n from the n vertical convolutions, and computing C(x,y) as a scalar product of V and a constant horizontal vector (H) formed from h(x);

after said transforming, collecting the stripes to form a transformed image;

storing and/or displaying the transformed image,

wherein a current row of a current stripe of the at least one stripe comprises a contiguous ordered sequence of a selected pixel (S1) at which C(x, y) is determined, u unselected pixels at which C(x, y) is not determined such that 1≦u≦(n−2), and a selected pixel (S2) at which C(x, y) is determined after C(x, y) at S1 is determined.

The present invention provides a method and system that computes convolutions selectively at pixels of an image in a manner that sufficiently exploits a spatial selectivity associated with selectively computing convolutions at some pixels but not at other pixels, which improves the efficiency of computing the convolutions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a pixel array in which convolutions are computed with spatial selectivity, in accordance with embodiments of the present invention.

FIG. 2 is a graphical illustration of how a two-dimensional convolution may be computed with a separable kernel, in accordance with embodiments of the present invention.

FIG. 3 is a system block diagram for generating a convoluted image in the pixel array of FIG. 1, in accordance with embodiments of the present invention.

FIG. 4 is a flow chart for generating a convoluted image in the pixel array of FIG. 1, in accordance with embodiments of the present invention.

FIG. 5 depicts three stripes of the input image overlapped by the height of a convolution matrix, in accordance with embodiments of the present invention.

FIG. 6 is a flow chart for determining a two-dimensional convolution at a selected pixel in a stripe, in accordance with embodiments of the present invention.

FIG. 7 is a flow chart describing computation in parallel of the two-dimensional convolution at the pixels of each stripe using a known convolution kernel, in accordance with embodiments of the present invention.

FIG. 8 depicts the structure of a stripe, in accordance with embodiments of the present invention.

FIG. 9 depicts a current row of a stripe and the submatrices associated with the current row, in accordance with embodiments of the present invention.

FIG. 10 depicts buffers used for processing the current row, in accordance with embodiments of the present invention.

FIG. 11 is a flow chart describing a determination and storage of vertical convolutions in a cache for a current pixel being processed, in accordance with embodiments of the present invention.

FIG. 12 is an example illustrating the determination and storage of vertical convolutions in a cache for a current pixel being processed, in accordance with embodiments of the present invention.

FIG. 13 illustrates a computer system used for transforming a spatially varying optical characteristic of an image in an array of pixels, in accordance with embodiments of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a pixel array 10 in which convolutions are computed with spatial selectivity, in accordance with embodiments of the present invention. The pixel array 10 is an array of pixels comprising rows and columns of pixels. The rows are oriented in the X direction and stacked in the Y direction. The columns are oriented in the Y direction and stacked in the X direction. The total number of columns and the total number of rows of the pixel array are denoted as NX and NY, respectively. Thus the column numbers of the pixel array are 1, 2, . . . , NX and the row numbers of the pixel array are 1, 2, . . . , NY. NX and NY are each at least 5. In FIG. 1, NX=NY=9, and the pixel array 10 comprises 81 pixels. The X and Y directions will be labeled as horizontal and vertical directions, respectively, for convenience only. However, the equations and algorithms of the present invention are invariant to interchanging X and Y. Therefore, the terms “horizontal” and “vertical” are intended to convey the concept of the X and Y directions being mutually perpendicular, but do not imply that the X or Y directions have any particular orientation with respect to a vector that is normal to the earth's surface.

Given two matrices F(x, y) and K(x, y) such that K comprises n×n elements, a two-dimensional convolution C(x, y) is given by:

$\begin{aligned} C(x,y) &= F(x,y) \otimes K(x,y) \\ &= \sum_{i = x - \mathrm{floor}((n-1)/2)}^{x + \mathrm{ceiling}((n-1)/2)} \; \sum_{j = y - \mathrm{floor}((n-1)/2)}^{y + \mathrm{ceiling}((n-1)/2)} F(i,j) \cdot K(|x-i|, |y-j|) \end{aligned}$

wherein i and j respectively denote pixel indexes in the X and Y directions, and wherein x and y respectively denote a pixel index in the X and Y direction at which the two-dimensional convolution C(x, y) is calculated.
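The double sum above can be sketched directly in code. The following is a minimal illustration only, under assumptions that are not part of the description: indices are 0-based, F is stored as a list of rows (F[y][x]), K is looked up by the absolute offsets (|x−i|, |y−j|), and pixels outside the array contribute zero. The function name conv2d_at is hypothetical.

```python
def conv2d_at(F, K, x, y):
    """Direct two-dimensional convolution C(x, y) at one pixel.

    Assumptions of this sketch (not taken from the description):
    F is a list of rows indexed F[y][x] with 0-based indices,
    K is an n x n table indexed K[|x-i|][|y-j|],
    and pixels outside the array are treated as zero.
    """
    n = len(K)
    lo = (n - 1) // 2      # floor((n-1)/2)
    hi = n - 1 - lo        # ceiling((n-1)/2)
    ny, nx = len(F), len(F[0])
    total = 0.0
    for i in range(x - lo, x + hi + 1):
        for j in range(y - lo, y + hi + 1):
            if 0 <= i < nx and 0 <= j < ny:
                total += F[j][i] * K[abs(x - i)][abs(y - j)]
    return total
```

With a 5×5 image of ones and a 3×3 kernel of ones, the sketch sums nine pixels at an interior position and four pixels at a corner, since the remaining kernel positions fall outside the array.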

The matrix F denotes a spatially varying optical characteristic (F) of an image in the pixel array 10. The optical characteristic may comprise, inter alia, an image intensity, luminance, etc.

The convolution C(x, y) essentially computes the amount of overlap between the matrix F and the kernel K. One of the applications is computing local intensity or luminance of an image at the pixel (x, y), which may be achieved by setting F to be the image intensity or luminance, and K to be a weight matrix, wherein K may be derived from a two-dimensional Gaussian surface. The convolution C(x, y) computes a weighted average of pixel values over a kernel space with the pixel at (x, y) having the highest weight.

In one embodiment, C(x, y) is determined efficiently and selectively (at certain points), using a single processor or using multi-core processors. ‘Selectively’ means that applications might not necessarily need to compute convolution values at all pixels of the image, but may require convolution values only at selected pixels. The selected pixels may be determined in any manner, such as from user input, default selected pixels, pixels outputted by a computer code, etc. For example, an image application may be only concerned with computing intensities of certain irregular image regions. That may be beneficial when memory is limited, which is typical in embedded multi-core processors. Such selective processing reduces the memory requirement of the convolution process.

The kernel K(x, y) is separable if the kernel can be written as the product of two one-dimensional kernels; i.e.,

K(x,y)=h(x)·v(y)

wherein h(x) is a horizontal kernel that is a function only of x, and wherein v(y) is a vertical kernel that is a function only of y. If the kernel K(x, y) is separable, then computing C(x, y) reduces to the following simplified form:

$C(x,y) = \sum_{i = x - \mathrm{floor}((n-1)/2)}^{x + \mathrm{ceiling}((n-1)/2)} h(|x-i|) \; \sum_{j = y - \mathrm{floor}((n-1)/2)}^{y + \mathrm{ceiling}((n-1)/2)} F(i,j) \cdot v(|y-j|)$

In the preceding formula for C(x, y), vertical convolutions V(i,y) are computed as the scalar product Σ_(j)F(i,j)v(|y−j|). Then C(x, y) is computed as the scalar product (also called the dot product) of h(|x−i|) and V(i,y), wherein h(|x−i|) is a constant horizontal vector denoted as H. Thus computing C(x, y) requires computing n vertical convolutions V(i₁,y), V(i₂,y), . . . , V(i_(n),y) which form an array V, followed by computing C(x, y) as the scalar product of H and V, wherein i₁=x−floor((n−1)/2), i₂=i₁+1, . . . , i_(n)=i_(n-1)+1=x+ceiling((n−1)/2).
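The two-stage computation just described (n vertical convolutions forming V, then the scalar product of H and V) may be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the invention: indices are 0-based, F is a list of rows (F[y][x]), the kernels h and v are lists indexed by absolute offset, and columns or rows outside the array contribute zero. The function names are hypothetical.

```python
def vertical_convolution(F, v, i, y):
    """V(i, y): scalar product of column i of F with v around row y.

    Assumed conventions (not from the description): 0-based indices,
    F[y][x] storage, v indexed by |y - j|, zero outside the array.
    """
    n = len(v)
    lo = (n - 1) // 2      # floor((n-1)/2)
    hi = n - 1 - lo        # ceiling((n-1)/2)
    ny, nx = len(F), len(F[0])
    if not 0 <= i < nx:
        return 0.0
    return sum(F[j][i] * v[abs(y - j)]
               for j in range(y - lo, y + hi + 1) if 0 <= j < ny)


def separable_conv_at(F, h, v, x, y):
    """C(x, y) as the scalar product of H and V for a separable kernel."""
    n = len(h)
    lo = (n - 1) // 2
    hi = n - 1 - lo
    columns = range(x - lo, x + hi + 1)                      # i_1 .. i_n
    V = [vertical_convolution(F, v, i, y) for i in columns]  # array V
    H = [h[abs(x - i)] for i in columns]                     # constant vector H
    return sum(a * b for a, b in zip(H, V))
```

For the 3×3 image [[1, 2, 3], [4, 5, 6], [7, 8, 9]] with h = v = [2, 1, 0] (weight 2 at offset 0 and weight 1 at offset 1), the array V at the centre pixel is [16, 20, 24], H is [1, 2, 1], and the scalar product is 80, which agrees with summing F(i,j)·h(|x−i|)·v(|y−j|) term by term.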

FIG. 2 is a graphical illustration of how the preceding two-dimensional convolution C(x, y) may be computed with a separable kernel, in accordance with embodiments of the present invention. FIG. 2 depicts a submatrix of F defining a 5×5 array (n=5) used for computing C(x, y) at the point (x, y). The scalar product computed between the vertical kernel v(y) and each of the 5 columns (11-15) of the submatrix is a vertical convolution, resulting in 5 respective scalar products (graphically represented by reference numerals 21-25) which form a 5-element horizontal array 30. A constant horizontal vector 20 formed from h(x) (more specifically, from h(|x−i|)) and denoted as H is used to compute the scalar product between the array 30 (V) and the constant horizontal vector 20 (H), resulting in C(x, y) at pixel (x,y).

A benefit of using the preceding simplified formula for C(x, y) with a separable kernel is that computing the convolution C(x+1,y) at pixel (x+1,y) merely requires computing one new vertical convolution, using the first column to the right of column 15 (not shown), and one horizontal convolution, and the last n−1 vertical convolutions (22-25) are reused. With multiple processors operating in parallel, computing the convolutions C(x, y) at different stripes of the image may be performed in parallel for added efficiency.
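As a rough illustration of this benefit, the cost per pixel can be compared by counting multiplications only (an assumption of this estimate; additions and memory traffic are ignored), with n=5 as a worked case:

$\begin{aligned} \text{direct two-dimensional sum:} \quad & n^{2} = 25 \\ \text{separable, nothing reused:} \quad & n \cdot n + n = n^{2} + n = 30 \\ \text{separable, } n-1 \text{ vertical convolutions reused:} \quad & n + n = 2n = 10 \end{aligned}$

Under this count, the separable form by itself does not reduce the work at a single isolated pixel; the saving comes from reusing vertical convolutions between nearby pixels on the same row.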

The present invention computes convolutions for selected pixels rather than for all pixels of the pixel array 10. A key to achieving this selectivity efficiently is exploiting the spatial locality associated with the convolutions and spatial locality within the selected pixels, by transiently saving previously determined vertical convolutions for potential reuse in the subsequent computation of the two-dimensional convolutions at other pixels on the same row. The present invention uses a novel convolution cache to exploit this spatial locality with negligible overhead. The cache allows reuse of previously computed vertical convolution values, which increases performance and decreases power and memory requirements. This is achieved without needing to compute convolutions at all pixels in the image.

The present invention uses a simple and highly regular convolution operation, which allows the matrix to be easily segmented among different processors prior to being convoluted. Each processor independently convolutes a stripe. Moreover, the convolution may be a highly data-parallel operation, which makes it amenable to vectorization processing.

The present invention includes a novel memory management scheme that is applicable to multi-core processors that rely on software memory management (such as the Cell BE processor). In one embodiment, the memory management utilizes quad-buffers that allow for overlapped computation and memory access operation, without requiring large buffer sizes.

The present invention has the following advantages over prior art. The invention has a significantly smaller memory requirement than prior art, which makes the present invention more suitable for embedded processors. The present invention operates faster than prior art when spatially selective convolutions are required (as in a High Dynamic Range (HDR) photographic tone mapping operator). The present invention uses less power, which additionally makes the present invention suitable for embedded processors. The invention is highly scalable and vectorizable on novel multi-core processors. The invention has a novel software memory management scheme that makes it applicable to low-power multi-core designs that rely on software to manage their memory hierarchy (such as the Cell BE processor).

In one embodiment, a multi-core processor is deployed on the Cell BE microprocessor (IBM, Cell Broadband Engine Architecture, October 2006, Version 1.01.), which is a low-power, heterogeneous multi-core design. Cell BE has one PowerPC processor element (PPE) and 8 Synergistic Processing Elements (SPE). Each SPE has a local memory which is software managed and optimized for stream processing.

FIG. 3 is a system block diagram for generating a convoluted image in the pixel array of FIG. 1, in accordance with embodiments of the present invention. The convoluted image matrix is configured to store a two-dimensional convolution at each pixel of the pixel array 10 of FIG. 1. FIG. 3 also shows data flow among system components on a Cell BE processor. The data flow is illustrated by the flow chart in FIG. 4, described infra.

In FIG. 3, Image Matrix (C1) is the input image matrix stored in main memory.

Segmentor/Dispatcher (C2) divides the image into ‘overlapped’ stripes and dispatches each stripe to an SPE; the components run on the PPE.

Submatrix Buffers (C3) include four input submatrix buffers. Three buffers hold image matrix submatrices, and the fourth buffer is filled by the memory manager (C4), which manages the filling of the submatrix buffers in a circular fashion.

Convolution Cache (C5) comprises CACHE_SIZE entries (a power of two) and a variable that holds a tag of the last saved entry. The cache is used to hold vertical convolution values. The entries are inserted in a circular fashion.

Vertical Convolution Engine (C6) performs a one-dimensional (1D) convolution, with the convolution kernel, in the vertical direction (Y).

Convolution Kernel (C7) is the 1D convolution kernel due to the separability of C(x, y).

Horizontal Convolution Engine (C8) performs a 1D convolution, with the convolution kernel, in the horizontal direction (X).

Convolution Buffers (C9) comprise two output buffers. A currently processed row is stored in one of the two buffers while the other row is written to memory. Then the buffers are interchanged.

Collector (C10) collects rows from SPEs and stores the rows into memory, forming the convoluted image.

Convoluted Image Values (C10) are convoluted image values which are outputted by the system.

FIG. 4 is a flow chart for generating the convoluted image in the pixel array of FIG. 1, in accordance with embodiments of the present invention. FIG. 4 comprises steps 31-36.

In step 31, the pixel array 10 for F is inputted and stored into main memory (C1).

Step 32 adds rows containing zero to the top and bottom of the pixel array 10. The number of rows added to the top and bottom of the pixel array 10 is set equal to floor(n/2) and ceiling(n/2), respectively, wherein n is the height (in pixels) of the convolution kernel in the Y direction, the function floor(n/2) is the largest integer less than or equal to n/2, and the function ceiling(n/2) is the smallest integer greater than or equal to n/2.
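The padding of step 32 can be sketched in a few lines. This is a minimal sketch assuming the pixel array is held as a list of rows; the function name pad_rows is hypothetical.

```python
def pad_rows(F, n):
    """Add floor(n/2) zero rows on top and ceiling(n/2) zero rows below.

    F is assumed to be a list of equally long rows (an assumption of
    this sketch); n is the kernel height in pixels.
    """
    nx = len(F[0])
    top = n // 2             # floor(n/2)
    bottom = (n + 1) // 2    # ceiling(n/2)
    zeros = lambda count: [[0] * nx for _ in range(count)]
    return zeros(top) + [list(row) for row in F] + zeros(bottom)
```

For a kernel height of n=5, two zero rows are added above and three below, so a 9-row array grows to 14 rows.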

Step 33 segments the pixel array 10 into one stripe or a plurality of stripes, wherein each stripe consists of a contiguous sequence of rows of the pixel array 10. Each horizontal stripe is dispatched to one SPE under control of the Segmentor/Dispatcher component (C2). The stripes are overlapped with a height (n) of the convolution kernel as shown in FIG. 5.

FIG. 5 depicts three stripes (Stripe 1, Stripe 2, Stripe 3) of the input array of pixels 10 overlapped by the height (n) of the kernel, in accordance with embodiments of the present invention.

Step 34 of FIG. 4 transforms the rows of each stripe, including determining a two-dimensional convolution C(x, y) of F at only selected pixels (x, y) in each stripe. FIG. 6 describes determining C(x, y) at a selected pixel (x, y) in greater detail.

In step 34 of FIG. 4, each SPE computes (e.g., in parallel) the two-dimensional convolution at the pixels of each dispatched stripe using a convolution kernel, as described infra in detail in FIG. 7. In one embodiment, each stripe comprises about a same number of rows, which optimizes the processing efficiency due to parallel processing (i.e., simultaneous processing) of the stripes by the respective SPEs in step 34. “About a same number of rows” means that the number of rows in each stripe cannot differ by more than 1 row from the number of rows in any other stripe.
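One way to segment a zero-padded array into overlapped stripes of about the same number of rows is sketched below. The exact stripe boundaries are an assumption of this sketch: each stripe is taken to hold its share of processed rows plus floor(n/2) extra rows above and ceiling(n/2) extra rows below, so that adjacent stripes overlap by n rows. The function name segment_into_stripes is hypothetical.

```python
def segment_into_stripes(ny, n, num_stripes):
    """Return (start, stop) row ranges into the zero-padded array.

    ny is the number of rows before padding, n the kernel height.
    Assumed layout (not confirmed by the description): the padded
    array has floor(n/2) zero rows on top and ceiling(n/2) below,
    and each stripe carries the same margins around its own rows.
    'stop' is exclusive.
    """
    fk2, ck2 = n // 2, (n + 1) // 2
    base, extra = divmod(ny, num_stripes)
    stripes = []
    first = 0  # first processed row of the stripe, in unpadded numbering
    for k in range(num_stripes):
        count = base + (1 if k < extra else 0)  # row counts differ by at most 1
        stripes.append((first, first + count + fk2 + ck2))
        first += count
    return stripes
```

For example, 10 image rows, a kernel height of 5 and 3 stripes give processed-row counts of 4, 3 and 3, and padded-row ranges (0, 9), (4, 12) and (7, 15).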

Step 35 collects the stripes from the SPEs to form a transformed image of the convoluted stripes.

Step 36 stores the transformed image into computer readable storage and/or displays the transformed image on a display device.

FIG. 6 is a flow chart for determining the two-dimensional convolution C(x, y) of F at a selected pixel (x, y) in a stripe, in accordance with embodiments of the present invention. FIG. 6 comprises steps 37-39.

Step 37 determines n vertical convolutions such that each vertical convolution is defined as a scalar product of F and v(y) in an n×n kernel space surrounding the selected pixel (x, y).

Step 38 forms the array (V) of dimension n from the n vertical convolutions determined in step 37.

Step 39 computes C(x, y) as a scalar product of V and the constant horizontal vector (H) formed from h(x).

FIG. 7 is a flow chart describing computation (e.g., in parallel), by each SPE, of the two-dimensional convolution at the pixels of each stripe using a known convolution kernel, in accordance with embodiments of the present invention. FIG. 7 comprises steps 41-53.

Step 41 initiates an outer loop, wherein each iteration of the outer loop scans a row in the stripe.

FIG. 8 depicts the structure of a stripe, in accordance with embodiments of the present invention. The processed rows of the stripe are disposed between fk2 unprocessed rows at the top of the stripe and ck2 unprocessed rows at the bottom of the stripe, wherein fk2=floor(n/2) and ck2=ceiling(n/2). The first row processed 37 in the first iteration is at the top of the stripe.

Step 42 of FIG. 7 divides the current row into consecutive n×n submatrices such that the current row bisects the submatrices (as shown in FIG. 9), wherein the kernel size n is a positive integer of at least 3. Each submatrix stores values of F(i,j). The height and width of each submatrix is set equal to the kernel size n. The variable ‘outbuffer’, which is set to zero, points to the convolution buffer (C9) to save convoluted values C(x, y) into.

FIG. 9 depicts a current row of a stripe and the submatrices associated with the current row, in accordance with embodiments of the present invention.

In step 43 of FIG. 7, the memory manager (C4) clears buffer 0 (i.e., stores zeros into buffer 0) and loads the first submatrix (in the current row) into buffer 1 (C3).

In step 44, the variable ‘curbuffer’ is set to 1 and the memory manager (C4) initiates loading the next submatrix into buffer 2 (C3). Such loading is nonblocking, allowing the following steps (45-51) to proceed without waiting for the load to finish.

Step 45 initiates an inner loop in which each iteration processes a next submatrix of the current row.

Step 46 sets ‘nextbuffer’ to (curbuffer+1) mod 4. The variable nextbuffer points to the next buffer to load (among four buffers: 0, 1, 2, 3). Each next buffer is determined in a circular manner through use of modulo arithmetic.

Step 47 waits until the buffer pointed to by nextbuffer is fetched. The following steps (48-51) require that, for a current block to be processed, a previous block and a next block reside in the submatrix buffers (C3) in SPE local memory.

Step 48 sets ‘next_sch_buffer’ to (nextbuffer+1) mod 4; i.e., next_sch_buffer points to the next buffer to fill.

Step 49 initiates fetching the next submatrix into the buffer pointed to by next_sch_buffer unless the last submatrix of the current row has been processed, in which case the buffer pointed to by next_sch_buffer is cleared (i.e., all entries are set to zero).

Step 50 selectively calculates convolution(s) for the current submatrix at selected pixels. The current submatrix comprises n elements along the current row respectively corresponding to n subcolumns of the submatrix. The two-dimensional convolution is computed selectively at only specified elements within the submatrix's current row. Each computed convolution is stored in convolution buffer outbuffer. Step 50 is described infra in detail in conjunction with FIGS. 11-12.

Step 51 sets curbuffer to (curbuffer+1) mod 4. Then if all submatrices of the current row have not been processed, the procedure loops back to step 45 to process the next submatrix for the current row in a next iteration of the inner loop; otherwise the inner loop is exited and step 52 of the outer loop is next executed.

Step 52 waits for the buffer pointed to by outbuffer to be written (if scheduled); then writing the buffer pointed to by outbuffer is initiated.

Step 53 sets outbuffer to (outbuffer+1) mod 2. Then if all rows have not been processed, the procedure loops back to step 41 to process the next row in a next iteration of the outer loop; otherwise the outer loop is exited and the procedure of FIG. 7 stops.
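The circular use of the four submatrix buffers in steps 43-51 can be traced with a small simulation. This is only a bookkeeping sketch under stated assumptions: loads are modelled as completing immediately (the nonblocking transfers and the wait of step 47 are not modelled), a cleared buffer is represented by None, and when no further submatrix exists the scheduled buffer is simply cleared. The function name quad_buffer_schedule is hypothetical.

```python
def quad_buffer_schedule(num_submatrices):
    """Trace which submatrix sits in the previous/current/next buffer.

    Returns one (previous, current, next) tuple per inner-loop
    iteration; each entry is a submatrix index or None for a
    zero-cleared buffer. Loads are treated as instantaneous, which
    is a simplification of this sketch.
    """
    buffers = [None] * 4
    buffers[0] = None                                   # step 43: clear buffer 0
    buffers[1] = 0                                      # step 43: first submatrix
    curbuffer = 1                                       # step 44
    buffers[2] = 1 if num_submatrices > 1 else None     # step 44: next submatrix
    next_to_fetch = 2
    trace = []
    for _ in range(num_submatrices):                    # step 45: inner loop
        nextbuffer = (curbuffer + 1) % 4                # step 46
        next_sch_buffer = (nextbuffer + 1) % 4          # step 48
        if next_to_fetch < num_submatrices:             # step 49: fetch or clear
            buffers[next_sch_buffer] = next_to_fetch
            next_to_fetch += 1
        else:
            buffers[next_sch_buffer] = None
        prevbuffer = (curbuffer - 1) % 4
        # step 50 would compute convolutions here using these three buffers
        trace.append((buffers[prevbuffer], buffers[curbuffer], buffers[nextbuffer]))
        curbuffer = (curbuffer + 1) % 4                 # step 51
    return trace
```

In this trace the buffer scheduled for filling is always two positions ahead of the current buffer, so it never coincides with the previous, current, or next buffer that the convolution step reads.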

FIG. 10 depicts buffers used for processing the current row in conjunction with step 50, in accordance with embodiments of the present invention. FIG. 10 depicts successive submatrices 61-65. Pixel 60 in submatrix 62 is being processed in the current row at pixel (x, y). As the processing of position 60 is initiated, submatrices 61, 63, and 64 have already been loaded into buffers. Submatrix 65 is in the buffer currently being filled. The vertical line 66 divides buffer 62 into a left part adjacent to buffer 61 and a right part adjacent to buffer 63. Since the position 60 is in the right part of buffer 62, the procedure will read values of F from submatrix 63 or 64 to the right of submatrix 62. If the position 60 were in the left part of buffer 62, the procedure would read values of F from submatrix 61 to the left of submatrix 62.

More specifically, both the buffer number of the buffer to be read (B_(R)) and the horizontal offset (OFFSET) within buffer B_(R) define the column of values of F to be read and then used for computing the two-dimensional convolution at position 60. Recalling that n denotes the submatrix width, B_(R) and OFFSET are calculated via:

B _(R)=floor(x/n); and

OFFSET=x mod n.

If each buffer has horizontal positions 1, 2, . . . , then the horizontal position within the buffer corresponding to OFFSET is 1+OFFSET. For example, if n=5 and x=13, then B_(R)=2 and OFFSET=3 and (1+OFFSET)=4, so that the next column of values of F will be read from horizontal position 4 within buffer 2.
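The two formulas amount to an integer division with remainder, as the following short sketch shows. The function name locate_column is hypothetical.

```python
def locate_column(x, n):
    """Return (B_R, OFFSET) for horizontal index x and submatrix width n.

    B_R = floor(x / n) and OFFSET = x mod n, as in the formulas above.
    """
    buffer_number, offset = divmod(x, n)
    return buffer_number, offset


# Worked case from the text: n = 5 and x = 13.
b_r, offset = locate_column(13, 5)
position_in_buffer = 1 + offset  # positions within a buffer are numbered from 1
```

Here b_r is 2, offset is 3, and position_in_buffer is 4, matching the example in the preceding paragraph.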

FIG. 11 is a flow chart describing a determination and storage ofvertical convolutions in a cache for a current pixel being processed, inaccordance with embodiments of the present invention.

FIG. 11 is a flow chart describing computation and storage of n verticalconvolution V(i,y) (i=1, 2, . . . , n) in a convolution cache ofdimension (n−1) for the current pixel (x, y) being processed, inaccordance with embodiments of the present invention. V(i,y) is computedin accordance V(i,y)=Σ_(j)F(i,j)v(|y−j|) as described supra. The cachepositions are numbered sequentially as 0, 1, 2, . . . , n−2.

Step 71 initiates a loop of n iterations for determining a verticalconvolution in each iteration i of the loop. A vertical convolution forpixel (i, y) in iteration i, wherein i=i₁, i₂, . . . , i_(n-1), i_(n)such that i₁=x−floor((n−1)/2), I₂=i₁+1, . . . , andi_(n)=i_(n-1)+1=x+ceiling((n−1)/2). The loop comprises steps 72-76.

Step 72 ascertains whether a vertical convolution for pixel (i, y) ofthe current row is in the cache. The cache consists of (n−1) storagelocations

If step 72 ascertains that the vertical convolution for pixel (i, y) ofthe current row is in the cache, then step 76 reads the verticalconvolution for pixel (i, y) from the cache. In one embodiment, step 76reads the vertical convolution for pixel (i, y) from cache position imod (n−1) since the cache consists of (n−1) positions. After step 76, ifall n vertical convolutions have not been determined, then the procedureloops back to step 71 to perform the next iteration of the loop for thenext vertical convolution; otherwise the loop is exited and step 77 isnext performed.

If step 72 ascertains that the vertical convolution for pixel (i, y) ofthe current row is not in the cache, then step 73 computes the verticalconvolution for pixel (i, y) as the scalar product of F and v(y) atpixel (i, y) in the kernel space surrounding pixel (x, y).

Step 74 stores the computed vertical convolution for pixel (i, y) in thecache. In one embodiment, step 74 stores the computed verticalconvolution for pixel (i, y) in the cache position i mod (n−1) since thecache consists of (n−1) positions.

In one embodiment, step 75 is performed. Step 75 sets a variable LAST tothe horizontal coordinate i pixel position (i, y).

After step 75 (or step 74 if step 75 is not performed), if all nvertical convolutions have not been determined, then the procedure loopsback to step 71 to perform the next iteration of the loop for the nextvertical convolution; otherwise the loop is exited and step 77 is nextperformed.

Step 77 forms the array V from the n vertical convolutions, for subsequent use in forming the scalar product of V and the corresponding constant horizontal vector H.
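By way of illustration only, the scalar product of V and H may be sketched in Python as follows. How H is formed from h(x) is not restated in this passage, so the sketch assumes that n is odd, that the horizontal kernel is symmetric and supplied as a list h indexed by distance from the centre column, and that H[k]=h(|k−(n−1)/2|). The function names are hypothetical.

```python
def horizontal_vector(h, n):
    """Build H of length n from a symmetric kernel h (assumed layout, n odd)."""
    r = (n - 1) // 2
    return [h[abs(k - r)] for k in range(n)]


def two_dimensional_convolution(V, H):
    """Return C(x, y) as the scalar product of V and H."""
    return sum(v_k * h_k for v_k, h_k in zip(V, H))


H = horizontal_vector([2, 1], 3)        # h(0) = 2, h(1) = 1
print(H)
print(two_dimensional_convolution([20, 25, 30], H))
```

Because H depends only on the kernel, it can be built once and reused for every selected pixel.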

In one embodiment, step 72 is performed through use of the variable LAST (see step 75) which is initially set to a negative number (e.g., −1) before each row in each stripe is processed. Step 72 ascertains that the vertical convolution for pixel (i, y) of the current row is in the cache if i is less than or equal to LAST; otherwise step 72 ascertains that the vertical convolution for pixel (i, y) of the current row is not in the cache.
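The loop of steps 71-77, in the embodiment that uses LAST and cache position i mod (n−1), may be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the function name, the compute callable (standing in for step 73) and the returned pair are hypothetical, column indices are assumed to be non-negative, and selected pixels are assumed to be processed in increasing x within a row.

```python
def gather_vertical_convolutions(x, n, cache, last, compute):
    """Return (V, LAST) for the selected pixel at column x of the current row.

    cache   -- list of n-1 slots, reused across the selected pixels of a row
    last    -- current value of LAST (-1 at the start of each row)
    compute -- hypothetical callable: compute(i) returns the vertical
               convolution for column i of the current row (step 73)
    """
    i_first = x - (n - 1) // 2
    V = []
    for i in range(i_first, i_first + n):      # step 71: n iterations
        slot = i % (n - 1)
        if i <= last:                          # step 72: already cached?
            value = cache[slot]                # step 76: read from the cache
        else:
            value = compute(i)                 # step 73: compute
            cache[slot] = value                # step 74: store
            last = i                           # step 75: update LAST
        V.append(value)
    return V, last                             # step 77: the array V


calls = []

def compute(i):
    calls.append(i)          # record which columns were actually computed
    return 10 * i            # placeholder value standing in for V(i, y)

cache, last = [None] * 4, -1
V_a, last = gather_vertical_convolutions(5, 5, cache, last, compute)
V_b, last = gather_vertical_convolutions(8, 5, cache, last, compute)
print(V_a, V_b, calls)
```

The test i ≤ LAST suffices because columns are visited in increasing order, so any column not exceeding LAST that belongs to the current window is among the n−1 most recently stored.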

FIG. 12 is an example illustrating the determination and storage of vertical convolutions in a cache for a current pixel being processed, in accordance with embodiments of the present invention. FIG. 12 conforms to the procedure described in the flow chart of FIG. 11. FIG. 12 presents views 56-59.

In view 56, row 5 is the current row and pixel A is the current pixel processed at which the two-dimensional convolution is computed and 5 vertical convolutions have been determined. Pixel A is at column 5 (x=5). The convolution matrix is a 5×5 square; i.e., n=5. The vertical convolutions pertaining to pixel A are to be determined for pixels (i, 5), wherein i=3, 4, 5, 6, 7 in the order of 3, 4, 5, 6, 7 in the X direction.

View 57 depicts the cache 28 after pixel A has been processed. The cache 28 consists of 4 positions (generally, the cache consists of the n−1 positions of 0, 1, 2, . . . , n−2) denoted as positions 0, 1, 2, 3. The last 4 vertical convolutions (c(A−1), c(A), c(A+1), c(A+2)), determined for i=4, 5, 6, 7, are stored in positions 0, 1, 2, 3 of the cache 28, respectively, in accordance with position i mod (n−1)=i mod 4. Therefore, LAST is set equal to 7, since c(A+2) for i=7 is the last vertical convolution written in the cache 28.

View 58 shows pixel B as the next pixel to be processed at which the two-dimensional convolution is to be computed and 5 vertical convolutions are to be determined. Pixel B is at column 8 (x=8). Thus the two-dimensional convolution is not computed at x=6 and x=7 in the current row. The vertical convolutions pertaining to pixel B are to be determined for pixels (i, 5), wherein i=6, 7, 8, 9, 10 in the order of 6, 7, 8, 9, 10 in the X direction. View 59 depicts the cache 28 after pixel B has been processed which is described as follows.

For pixel 6, i=6 is less than or equal to LAST=7. Therefore, the vertical convolution is read from the cache 28 at position i mod (n−1)=6 mod 4=2. Indeed, the vertical convolution c(A+1) corresponding to i=6 is read from position 2 in the cache 28. Thus, the cache 28 remains unchanged as a result of processing i=6.

For pixel 7, i=7 is less than or equal to LAST=7. Therefore, the vertical convolution is read from the cache 28 at position i mod (n−1)=7 mod 4=3. Indeed, the vertical convolution c(A+2) corresponding to i=7 is read from position 3 in the cache 28. Thus, the cache 28 remains unchanged as a result of processing i=7.

For pixel 8, i=8 which exceeds LAST=7. Therefore, the vertical convolution for i=8 is computed and stored in the cache 28 at position i mod (n−1)=8 mod 4=0. Thus, the computed vertical convolution for i=8, namely c(B), is stored in position 0 of the cache 28 as shown in view 59. Then LAST is set to 8.

For pixel 9, i=9 which exceeds LAST=8. Therefore, the vertical convolution for i=9 is computed and stored in the cache 28 at position i mod (n−1)=9 mod 4=1. Thus, the computed vertical convolution for i=9, namely c(B+1), is stored in position 1 of the cache 28 as shown in view 59. Then LAST is set to 9.

For pixel 10, i=10 which exceeds LAST=9. Therefore, the vertical convolution for i=10 is computed and stored in the cache 28 at position i mod (n−1)=10 mod 4=2. Thus, the computed vertical convolution for i=10, namely c(B+2), is stored in position 2 of the cache 28 as shown in view 59. Then LAST is set to 10. Note that c(A+2), which is the same as c(B−1), remains in position 3 in the cache 28 and is unchanged from the result of processing pixel A.
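The sequence of cache operations in this example can be reproduced with a short Python trace. The trace is illustrative only: it stores the column index i in each cache position in place of the actual vertical convolution value, and the function name trace_row is hypothetical.

```python
def trace_row(selected_columns, n):
    """Trace cache reads and stores for the selected pixels of one row.

    The cache holds the column index i rather than a convolution value,
    so the final cache contents show which column occupies each position.
    """
    cache = [None] * (n - 1)
    last = -1                                   # LAST is reset for each row
    events = []
    for x in selected_columns:
        i_first = x - (n - 1) // 2
        for i in range(i_first, i_first + n):
            position = i % (n - 1)
            if i <= last:
                events.append((x, i, "read", position))
            else:
                cache[position] = i
                last = i
                events.append((x, i, "store", position))
    return events, cache, last


# Pixel A at x=5, then pixel B at x=8, with a 5x5 convolution matrix (n=5).
events, cache, last = trace_row([5, 8], n=5)
for event in events:
    print(event)
print(cache, last)
```

For pixel B the trace shows two reads (i=6 and i=7 at positions 2 and 3) followed by three stores (i=8, 9, 10 at positions 0, 1, 2), which agrees with views 58 and 59 as described above.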

FIG. 12 illustrates an embodiment in which: a current row y of a current stripe comprises a contiguous ordered sequence of a selected pixel (S1) at which C(x, y) is determined; u unselected pixels at which C(x, y) is not determined such that 1≦u≦(n−2); and a selected pixel (S2) at which C(x, y) is determined after C(x, y) at S1 is determined. In one embodiment u=1, in one embodiment u=2, . . . , and in one embodiment u=n−2.

In FIG. 12 with respect to the preceding embodiment: the current row is row 5, pixel S1 is represented by pixel A, pixel S2 is represented by pixel B, and u=2.
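The number of vertical convolutions that S2 can reuse from the cache follows from the overlap of the two windows, and equals n−1−u (two in FIG. 12, where n=5 and u=2). The following Python sketch counts that overlap directly; the function name reused_count and the arbitrary column chosen for S1 are hypothetical.

```python
def reused_count(n, u):
    """Count the vertical convolutions cached at S1 that S2 also needs."""
    x1 = 100                            # arbitrary column for S1 (assumption)
    x2 = x1 + u + 1                     # S2 follows u unselected pixels
    r = (n - 1) // 2
    window_1 = list(range(x1 - r, x1 - r + n))
    cached = set(window_1[1:])          # the n-1 most recent columns stay cached
    window_2 = set(range(x2 - r, x2 - r + n))
    return len(cached & window_2)


print(reused_count(5, 2))
```

With u limited to the range 1 to n−2, at least one cached vertical convolution is always reused at S2.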

FIG. 13 illustrates a computer system 90 used for selectively transforming a spatially varying optical characteristic of an image in an array of pixels, in accordance with embodiments of the present invention. The computer system 90 comprises a processor 91, an input device 92 coupled to the processor 91, an output device 93 coupled to the processor 91, and memory devices 94 and 95 each coupled to the processor 91. The processor 91 is a processing unit such as a central processing unit (CPU). The input device 92 may be, inter alia, a keyboard, a mouse, etc. The output device 93 may be, inter alia, a printer, a plotter, a display device (e.g., a computer screen), a magnetic tape, a removable hard disk, a floppy disk, etc. The display device may comprise the display area 10 of FIG. 1. The memory devices 94 and 95 may be, inter alia, a hard disk, a floppy disk, a magnetic tape, an optical storage such as a compact disc (CD) or a digital video disc (DVD), a dynamic random access memory (DRAM), a read-only memory (ROM), etc. The memory device 95 includes a computer code 97 which is a computer program that comprises computer-executable instructions. The computer code 97 includes an algorithm for selectively transforming a spatially varying optical characteristic of an image in an array of pixels. The processor 91 executes the computer code 97. The memory device 94 includes input data 96. The input data 96 includes input required by the computer code 97. The output device 93 displays output from the computer code 97. Either or both memory devices 94 and 95 (or one or more additional memory devices not shown in FIG. 13) may be used as a computer usable storage medium (or program storage device) having a computer readable program embodied therein and/or having other data stored therein, wherein the computer readable program comprises the computer code 97. Generally, a computer program product (or, alternatively, an article of manufacture) of the computer system 90 may comprise said computer usable storage medium (or said program storage device).

While FIG. 13 shows the computer system 90 as a particular configuration of hardware and software, any configuration of hardware and software, as would be known to a person of ordinary skill in the art, may be utilized for the purposes stated supra in conjunction with the particular computer system 90 of FIG. 13. For example, the memory devices 94 and 95 may be portions of a single memory device rather than separate memory devices.

While particular embodiments of the present invention have been described herein for purposes of illustration, many modifications and changes will become apparent to those skilled in the art. Accordingly, the appended claims are intended to encompass all such modifications and changes as fall within the true spirit and scope of this invention.

1. A method for selectively transforming a spatially varying optical characteristic (F) of an image in an array of pixels, said array characterized by NY rows of pixels oriented in an X direction and NX columns of pixels oriented in a Y direction, said NX and NY each at least 5, said optical characteristic denoted as F(x,y) such that x and y are indexes of pixels in the X and Y directions, said method implemented by execution of instructions by a processor of a computer system, said instructions being stored on computer readable storage media of the computer system, said method comprising: segmenting the array of pixels into at least one stripe, wherein each stripe consists of a contiguous sequence of rows of the array of pixels; transforming the rows of each stripe, said transforming comprising determining a two-dimensional convolution C(x, y) of F at only selected pixels (x, y) in each stripe, wherein C(x, y) is a function of a product of a horizontal kernel h(x) and a vertical kernel v(y), said h(x) and v(y) each of dimension n≧3, wherein said determining C(x, y) at each selected pixel (x, y) in each row of each stripe comprises: determining n vertical convolutions such that each vertical convolution is equal to a scalar product of F and v(y) in a kernel space surrounding (x,y), forming an array (V) of dimension n from the n vertical convolutions, and computing C(x,y) as a scalar product of V and a constant horizontal vector (H) formed from h(x); after said transforming, collecting the stripes to form a transformed image; storing and/or displaying the transformed image, wherein a current row of a current stripe of the at least one stripe comprises a contiguous ordered sequence of a selected pixel (S1) at which C(x, y) is determined, u unselected pixels at which C(x, y) is not determined such that 1≦u≦(n−2), and a selected pixel (S2) at which C(x, y) is determined after C(x, y) at S1 is determined.
2. The method of claim 1, wherein the at least one stripe consists of a plurality of stripes, wherein each stripe comprises about a same number of rows, and wherein said transforming the rows of each stripe is performed simultaneously for the plurality of stripes.
3. The method of claim 1, wherein determining C(x, y) at S1 results in n−1 vertical convolutions of the n vertical convolutions determined at S1 being stored in a convolution cache of dimension n−1; wherein m vertical convolutions of the n vertical convolutions determined for S2 is a subset of the n−1 vertical convolutions stored in the cache such that m=n−1−u; wherein determining C(x, y) at S2 comprises reading the m vertical convolutions from the cache and does not comprise computing the m vertical convolutions as the scalar product of F and v(y) in the kernel space surrounding S2.
4. The method of claim 3, wherein the cache locations are numbered sequentially as 0, 1, . . . , n−2, wherein said determining C(x, y) at S2 comprises determining a vertical convolution for each pixel (i, y) for i=i₁, i₂, . . . , i_(n-1), i_(n) such that i₁=x−floor((n−1)/2), i₂=i₁+1, . . . , and i_(n)=i_(n-1)+1=x+ceiling((n−1)/2), and wherein said determining a vertical convolution for each pixel (i, y) comprises: ascertaining whether a vertical convolution for pixel (i, y) is in the cache; if said ascertaining ascertains that the vertical convolution for pixel (i, y) is in the cache, then reading the vertical convolution for pixel (i, y) from the cache; if said ascertaining ascertains that the vertical convolution for pixel (i, y) is not in the cache, then computing the vertical convolution for pixel (i, y) as the scalar product of F and v(y) at pixel (i, y) in the kernel space surrounding S2, followed by storing the computed vertical convolution for pixel (i, y) in the cache.
5. The method of claim 4, wherein said reading the vertical convolution for pixel (i, y) from the cache comprises reading the vertical convolution for pixel (i, y) from cache position i mod (n−1); wherein said storing the computed vertical convolution for pixel (i, y) in the cache comprises storing the computed vertical convolution for pixel (i, y) at cache position i mod (n−1).
6. The method of claim 5, wherein if said ascertaining ascertains that the vertical convolution for pixel (i, y) is not in the cache, then after said storing the computed vertical convolution for pixel (i, y) in the cache: setting a variable LAST to i; wherein said ascertaining comprises ascertaining that the vertical convolution for pixel (i, y) is in the cache if i is less than or equal to LAST, otherwise ascertaining that the vertical convolution for pixel (i, y) is not in the cache.
7. The method of claim 3, wherein determining C(x, y) at S2 comprises reading a subcolumn of values of F from a buffer (B_(R)) at a horizontal position of (1+OFFSET) within buffer B_(R); and wherein prior to said reading the subcolumn of values of F the method further comprises determining B_(R) as equal to floor(x/n) and determining OFFSET as equal to x mod n.
8. A computer program product, comprising a computer readable storage medium having a computer readable program code stored therein, said computer readable program code containing instructions that when executed by a processor of a computer system implement a method for selectively transforming a spatially varying optical characteristic (F) of an image in an array of pixels, said array characterized by NY rows of pixels oriented in an X direction and NX columns of pixels oriented in a Y direction, said NX and NY each at least 5, said optical characteristic denoted as F(x,y) such that x and y are indexes of pixels in the X and Y directions, said method comprising: segmenting the array of pixels into at least one stripe, wherein each stripe consists of a contiguous sequence of rows of the array of pixels; transforming the rows of each stripe, said transforming comprising determining a two-dimensional convolution C(x, y) of F at only selected pixels (x, y) in each stripe, wherein C(x, y) is a function of a product of a horizontal kernel h(x) and a vertical kernel v(y), said h(x) and v(y) each of dimension n≧3, wherein said determining C(x, y) at each selected pixel (x, y) in each row of each stripe comprises: determining n vertical convolutions such that each vertical convolution is equal to a scalar product of F and v(y) in a kernel space surrounding (x,y), forming an array (V) of dimension n from the n vertical convolutions, and computing C(x,y) as a scalar product of V and a constant horizontal vector (H) formed from h(x); after said transforming, collecting the stripes to form a transformed image; storing and/or displaying the transformed image, wherein a current row of a current stripe of the at least one stripe comprises a contiguous ordered sequence of a selected pixel (S1) at which C(x, y) is determined, u unselected pixels at which C(x, y) is not determined such that 1≦u≦(n−2), and a selected pixel (S2) at which C(x, y) is determined after C(x, y) at S1 is determined.
9. The computer program product of claim 8, wherein the at least one stripe consists of a plurality of stripes, wherein each stripe comprises about a same number of rows, and wherein said transforming the rows of each stripe is performed simultaneously for the plurality of stripes.
10. The computer program product of claim 8, wherein determining C(x, y) at S1 results in n−1 vertical convolutions of the n vertical convolutions determined at S1 being stored in a convolution cache of dimension n−1; wherein m vertical convolutions of the n vertical convolutions determined for S2 is a subset of the n−1 vertical convolutions stored in the cache such that m=n−1−u; wherein determining C(x, y) at S2 comprises reading the m vertical convolutions from the cache and does not comprise computing the m vertical convolutions as the scalar product of F and v(y) in the kernel space surrounding S2.
11. The computer program product of claim 10, wherein the cache locations are numbered sequentially as 0, 1, . . . , n−2, wherein said determining C(x, y) at S2 comprises determining a vertical convolution for each pixel (i, y) for i=i₁, i₂, . . . , i_(n-1), i_(n) such that i₁=x−floor((n−1)/2), i₂=i₁+1, . . . , and i_(n)=i_(n-1)+1=x+ceiling((n−1)/2), and wherein said determining a vertical convolution for each pixel (i, y) comprises: ascertaining whether a vertical convolution for pixel (i, y) is in the cache; if said ascertaining ascertains that the vertical convolution for pixel (i, y) is in the cache, then reading the vertical convolution for pixel (i, y) from the cache; if said ascertaining ascertains that the vertical convolution for pixel (i, y) is not in the cache, then computing the vertical convolution for pixel (i, y) as the scalar product of F and v(y) at pixel (i, y) in the kernel space surrounding S2, followed by storing the computed vertical convolution for pixel (i, y) in the cache.
12. The computer program product of claim 11, wherein said reading the vertical convolution for pixel (i, y) from the cache comprises reading the vertical convolution for pixel (i, y) from cache position i mod (n−1); wherein said storing the computed vertical convolution for pixel (i, y) in the cache comprises storing the computed vertical convolution for pixel (i, y) at cache position i mod (n−1).
13. The computer program product of claim 12, wherein if said ascertaining ascertains that the vertical convolution for pixel (i, y) is not in the cache, then after said storing the computed vertical convolution for pixel (i, y) in the cache: setting a variable LAST to i; wherein said ascertaining comprises ascertaining that the vertical convolution for pixel (i, y) is in the cache if i is less than or equal to LAST, otherwise ascertaining that the vertical convolution for pixel (i, y) is not in the cache.
14. The computer program product of claim 10, wherein determining C(x, y) at S2 comprises reading a subcolumn of values of F from a buffer (B_(R)) at a horizontal position of (1+OFFSET) within buffer B_(R); and wherein prior to said reading the subcolumn of values of F the method further comprises determining B_(R) as equal to floor(x/n) and determining OFFSET as equal to x mod n.
15. A computer system comprising a processor and a computer readable memory unit coupled to the processor, said memory unit containing instructions that when executed by the processor implement a method for selectively transforming a spatially varying optical characteristic (F) of an image in an array of pixels, said array characterized by NY rows of pixels oriented in an X direction and NX columns of pixels oriented in a Y direction, said NX and NY each at least 5, said optical characteristic denoted as F(x,y) such that x and y are indexes of pixels in the X and Y directions, said method comprising: segmenting the array of pixels into at least one stripe, wherein each stripe consists of a contiguous sequence of rows of the array of pixels; transforming the rows of each stripe, said transforming comprising determining a two-dimensional convolution C(x, y) of F at only selected pixels (x, y) in each stripe, wherein C(x, y) is a function of a product of a horizontal kernel h(x) and a vertical kernel v(y), said h(x) and v(y) each of dimension n≧3, wherein said determining C(x, y) at each selected pixel (x, y) in each row of each stripe comprises: determining n vertical convolutions such that each vertical convolution is equal to a scalar product of F and v(y) in a kernel space surrounding (x,y), forming an array (V) of dimension n from the n vertical convolutions, and computing C(x,y) as a scalar product of V and a constant horizontal vector (H) formed from h(x); after said transforming, collecting the stripes to form a transformed image; storing and/or displaying the transformed image, wherein a current row of a current stripe of the at least one stripe comprises a contiguous ordered sequence of a selected pixel (S1) at which C(x, y) is determined, u unselected pixels at which C(x, y) is not determined such that 1≦u≦(n−2), and a selected pixel (S2) at which C(x, y) is determined after C(x, y) at S1 is determined.
16. The computer system of claim 15, wherein the at least one stripe consists of a plurality of stripes, wherein each stripe comprises about a same number of rows, and wherein said transforming the rows of each stripe is performed simultaneously for the plurality of stripes.
17. The computer system of claim 15, wherein determining C(x, y) at S1 results in n−1 vertical convolutions of the n vertical convolutions determined at S1 being stored in a convolution cache of dimension n−1; wherein m vertical convolutions of the n vertical convolutions determined for S2 is a subset of the n−1 vertical convolutions stored in the cache such that m=n−1−u; wherein determining C(x, y) at S2 comprises reading the m vertical convolutions from the cache and does not comprise computing the m vertical convolutions as the scalar product of F and v(y) in the kernel space surrounding S2.
18. The computer system of claim 17, wherein the cache locations are numbered sequentially as 0, 1, . . . , n−2, wherein said determining C(x, y) at S2 comprises determining a vertical convolution for each pixel (i, y) for i=i₁, i₂, . . . , i_(n-1), i_(n) such that i₁=x−floor((n−1)/2), i₂=i₁+1, . . . , and i_(n)=i_(n-1)+1=x+ceiling((n−1)/2), and wherein said determining a vertical convolution for each pixel (i, y) comprises: ascertaining whether a vertical convolution for pixel (i, y) is in the cache; if said ascertaining ascertains that the vertical convolution for pixel (i, y) is in the cache, then reading the vertical convolution for pixel (i, y) from the cache; if said ascertaining ascertains that the vertical convolution for pixel (i, y) is not in the cache, then computing the vertical convolution for pixel (i, y) as the scalar product of F and v(y) at pixel (i, y) in the kernel space surrounding S2, followed by storing the computed vertical convolution for pixel (i, y) in the cache.
19. The computer system of claim 18, wherein said reading the vertical convolution for pixel (i, y) from the cache comprises reading the vertical convolution for pixel (i, y) from cache position i mod (n−1); wherein said storing the computed vertical convolution for pixel (i, y) in the cache comprises storing the computed vertical convolution for pixel (i, y) at cache position i mod (n−1).
20. The computer system of claim 19, wherein if said ascertaining ascertains that the vertical convolution for pixel (i, y) is not in the cache, then after said storing the computed vertical convolution for pixel (i, y) in the cache: setting a variable LAST to i; wherein said ascertaining comprises ascertaining that the vertical convolution for pixel (i, y) is in the cache if i is less than or equal to LAST, otherwise ascertaining that the vertical convolution for pixel (i, y) is not in the cache.